AITopics | contextualized representation

Collaborating Authors

contextualized representation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Contextualized Token Discrimination for Speech Search Query Correction

Lu, Junyu, Jiang, Di, Hong, Mengze, Wei, Victor Junqiu, Guo, Qintian, Su, Zhiyang

arXiv.org Artificial IntelligenceSep-5-2025

Query spelling correction is an important function of modern search engines since it effectively helps users express their intentions clearly. With the growing popularity of speech search driven by Automated Speech Recognition (ASR) systems, this paper introduces a novel method named Contextualized Token Discrimination (CTD) to conduct effective speech query correction. In CTD, we first employ BERT to generate token-level contextualized representations and then construct a composition layer to enhance semantic information. Finally, we produce the correct query according to the aggregated token representation, correcting the incorrect tokens by comparing the original token representations and the contextualized representations. Extensive experiments demonstrate the superior performance of our proposed method across all metrics, and we further present a new benchmark dataset with erroneous ASR transcriptions to offer comprehensive evaluations for audio query correction.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2509.04393

Country: Asia > China (0.14)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(2 more...)

Add feedback

Improving Data and Parameter Efficiency of Neural Language Models Using Representation Analysis

Jukić, Josip

arXiv.org Artificial IntelligenceJul-17-2025

This thesis addresses challenges related to data and parameter efficiency in neural language models, with a focus on representation analysis and the introduction of new optimization techniques. The first part examines the properties and dynamics of language representations within neural models, emphasizing their significance in enhancing robustness and generalization. It proposes innovative approaches based on representation smoothness, including regularization strategies that utilize Jacobian and Hessian matrices to stabilize training and mitigate sensitivity to input perturbations. The second part focuses on methods to significantly enhance data and parameter efficiency by integrating active learning strategies with parameter-efficient fine-tuning, guided by insights from representation smoothness analysis. It presents smoothness-informed early-stopping techniques designed to eliminate the need for labeled validation sets and proposes innovative combinations of active learning and parameter-efficient fine-tuning to reduce labeling efforts and computational resources. Extensive experimental evaluations across various NLP tasks demonstrate that these combined approaches substantially outperform traditional methods in terms of performance, stability, and efficiency. The third part explores weak supervision techniques enhanced by in-context learning to effectively utilize unlabeled data, further reducing dependence on extensive labeling. It shows that using in-context learning as a mechanism for weak supervision enables models to better generalize from limited labeled data by leveraging unlabeled examples more effectively during training. Comprehensive empirical evaluations confirm significant gains in model accuracy, adaptability, and robustness, especially in low-resource settings and dynamic data environments.

large language model, machine learning, natural language, (27 more...)

arXiv.org Artificial Intelligence

2507.12004

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota (0.27)

Genre:

Research Report > Promising Solution (1.00)
Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Leisure & Entertainment (1.00)
Media > Film (0.92)
Law (0.67)
Education > Curriculum (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(6 more...)

Add feedback

Local and Global Contexts for Conversation

Lin, Zuoquan, Shen, Xinyi

arXiv.org Artificial IntelligenceJan-30-2024

The context in conversation is the dialog history crucial for multi-turn dialogue. Learning from the relevant contexts in dialog history for grounded conversation is a challenging problem. Local context is the most neighbor and more sensitive to the subsequent response, and global context is relevant to a whole conversation far beyond neighboring utterances. Currently, pretrained transformer models for conversation challenge capturing the correlation and connection between local and global contexts. We introduce a local and global conversation model (LGCM) for general-purpose conversation in open domain. It is a local-global hierarchical transformer model that excels at accurately discerning and assimilating the relevant contexts necessary for generating responses. It employs a local encoder to grasp the local context at the level of individual utterances and a global encoder to understand the broader context at the dialogue level. The seamless fusion of these locally and globally contextualized encodings ensures a comprehensive comprehension of the conversation. Experiments on popular datasets show that LGCM outperforms the existing conversation models on the performance of automatic metrics with significant margins.

computational linguistic, encoder, utterance, (15 more...)

arXiv.org Artificial Intelligence

2401.17588

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Italy > Tuscany > Florence (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(10 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)

Add feedback

In-Context Probing: Toward Building Robust Classifiers via Probing Large Language Models

Amini, Afra, Ciaramita, Massimiliano

arXiv.org Artificial IntelligenceDec-22-2023

Large language models are able to learn new tasks in context, where they are provided with instructions and a few annotated examples. However, the effectiveness of in-context learning is dependent on the provided context, and the performance on a downstream task can vary considerably, depending on the instruction. Importantly, such dependency on the context can surface in unpredictable ways, e.g., a seemingly more informative instruction might lead to a worse performance. In this paper, we propose an alternative approach, which we term In-Context Probing (ICP). Similar to in-context learning, we contextualize the representation of the input with an instruction, but instead of decoding the output prediction, we probe the contextualized representation to predict the label. Through a series of experiments on a diverse set of classification tasks, we show that in-context probing is significantly more robust to changes in instructions. We further show that ICP performs competitive or superior to finetuning and can be particularly helpful to build classifiers on top of smaller models, with less than a hundred training examples.

flan, icl, instruction, (15 more...)

arXiv.org Artificial Intelligence

2305.14171

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(9 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Homophone Disambiguation Reveals Patterns of Context Mixing in Speech Transformers

Mohebbi, Hosein, Chrupała, Grzegorz, Zuidema, Willem, Alishahi, Afra

arXiv.org Artificial IntelligenceOct-15-2023

Transformers have become a key architecture in speech processing, but our understanding of how they build up representations of acoustic and linguistic structure is limited. In this study, we address this gap by investigating how measures of 'context-mixing' developed for text models can be adapted and applied to models of spoken language. We identify a linguistic phenomenon that is ideal for such a case study: homophony in French (e.g. livre vs livres), where a speech recognition model has to attend to syntactic cues such as determiners and pronouns in order to disambiguate spoken words with identical pronunciations and transcribe them while respecting grammatical agreement. We perform a series of controlled experiments and probing analyses on Transformer-based speech models. Our findings reveal that representations in encoder-only models effectively incorporate these cues to identify the correct transcription, whereas encoders in encoder-decoder models mainly relegate the task of capturing contextual dependencies to decoder modules.

cue word, encoder, representation, (15 more...)

arXiv.org Artificial Intelligence

2310.09925

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe > Italy > Tuscany > Florence (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(8 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.89)

Add feedback

INK: Injecting kNN Knowledge in Nearest Neighbor Machine Translation

Zhu, Wenhao, Xu, Jingjing, Huang, Shujian, Kong, Lingpeng, Chen, Jiajun

arXiv.org Artificial IntelligenceJun-10-2023

Neural machine translation has achieved promising results on many translation tasks. However, previous studies have shown that neural models induce a non-smooth representation space, which harms its generalization results. Recently, kNN-MT has provided an effective paradigm to smooth the prediction based on neighbor representations during inference. Despite promising results, kNN-MT usually requires large inference overhead. We propose an effective training framework INK to directly smooth the representation space via adjusting representations of kNN neighbors with a small number of new parameters. The new parameters are then used to refresh the whole representation datastore to get new kNN knowledge asynchronously. This loop keeps running until convergence. Experiments on four benchmark datasets show that \method achieves average gains of 1.99 COMET and 1.0 BLEU, outperforming the state-of-the-art kNN-MT system with 0.02x memory space and 1.9x inference speedup.

artificial intelligence, natural language, representation, (16 more...)

arXiv.org Artificial Intelligence

2306.06381

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

Toward Adversarial Training on Contextualized Language Representation

Wu, Hongqiu, Liu, Yongxiang, Shi, Hanwen, Zhao, Hai, Zhang, Min

arXiv.org Artificial IntelligenceMay-8-2023

Beyond the success story of adversarial training (AT) in the recent text domain on top of pre-trained language models (PLMs), our empirical study showcases the inconsistent gains from AT on some tasks, e.g. commonsense reasoning, named entity recognition. This paper investigates AT from the perspective of the contextualized language representation outputted by PLM encoders. We find the current AT attacks lean to generate sub-optimal adversarial examples that can fool the decoder part but have a minor effect on the encoder. However, we find it necessary to effectively deviate the latter one to allow AT to gain. Based on the observation, we propose simple yet effective \textit{Contextualized representation-Adversarial Training} (CreAT), in which the attack is explicitly optimized to deviate the contextualized representation of the encoder. It allows a global optimization of adversarial examples that can fool the entire model. We also find CreAT gives rise to a better direction to optimize the adversarial examples, to let them less sensitive to hyperparameters. Compared to AT, CreAT produces consistent performance gains on a wider range of tasks and is proven to be more effective for language pre-training where only the encoder part is kept for downstream tasks. We achieve the new state-of-the-art performances on a series of challenging benchmarks, e.g. AdvGLUE (59.1 $ \rightarrow $ 61.1), HellaSWAG (93.0 $ \rightarrow $ 94.9), ANLI (68.1 $ \rightarrow $ 69.3).

computational linguistic, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2305.04557

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
(18 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Exploring Wav2vec 2.0 fine-tuning for improved speech emotion recognition

Chen, Li-Wei, Rudnicky, Alexander

arXiv.org Artificial IntelligenceFeb-21-2023

While Wav2Vec 2.0 has been proposed for speech recognition (ASR), it can also be used for speech emotion recognition (SER); its performance can be significantly improved using different fine-tuning strategies. Two baseline methods, vanilla fine-tuning (V-FT) and task adaptive pretraining (TAPT) are first presented. We show that V-FT is able to outperform state-of-the-art models on the IEMOCAP dataset. TAPT, an existing NLP fine-tuning strategy, further improves the performance on SER. We also introduce a novel fine-tuning method termed P-TAPT, which modifies the TAPT objective to learn contextualized emotion representations. Experiments show that P-TAPT performs better than TAPT, especially under low-resource settings. Compared to prior works in this literature, our top-line system achieved a 7.4\% absolute improvement in unweighted accuracy (UA) over the state-of-the-art performance on IEMOCAP. Our code is publicly available.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2110.06309

Country:

North America > United States > California > San Diego County > San Diego (0.04)
Asia (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Natural Language (0.69)
(2 more...)

Add feedback

Reconstruction Probing

Kim, Najoung, Khilnani, Jatin, Warstadt, Alex, Qaddoumi, Abed

arXiv.org Artificial IntelligenceDec-21-2022

We propose reconstruction probing, a new analysis method for contextualized representations based on reconstruction probabilities in masked language models (MLMs). This method relies on comparing the reconstruction probabilities of tokens in a given sequence when conditioned on the representation of a single token that has been fully contextualized and when conditioned on only the decontextualized lexical prior of the model. This comparison can be understood as quantifying the contribution of contextualization towards reconstruction -- the difference in the reconstruction probabilities can only be attributed to the representational change of the single token induced by contextualization. We apply this analysis to three MLMs and find that contextualization boosts reconstructability of tokens that are close to the token being reconstructed in terms of linear and syntactic distance. Furthermore, we extend our analysis to finer-grained decomposition of contextualized representations, and we find that these boosts are largely attributable to static and positional embeddings at the input layer.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2212.10792

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Italy > Tuscany > Florence (0.04)
North America > United States > New York (0.04)
(9 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Add feedback

RED-ACE: Robust Error Detection for ASR using Confidence Embeddings

Gekhman, Zorik, Zverinski, Dina, Mallinson, Jonathan, Beryozkin, Genady

arXiv.org Artificial IntelligenceOct-26-2022

ASR Error Detection (AED) models aim to post-process the output of Automatic Speech Recognition (ASR) systems, in order to detect transcription errors. Modern approaches usually use text-based input, comprised solely of the ASR transcription hypothesis, disregarding additional signals from the ASR model. Instead, we propose to utilize the ASR system's word-level confidence scores for improving AED performance. Specifically, we add an ASR Confidence Embedding (ACE) layer to the AED model's encoder, allowing us to jointly encode the confidence scores and the transcribed text into a contextualized representation. Our experiments show the benefits of ASR confidence scores for AED, their complementary effect over the textual signal, as well as the effectiveness and robustness of ACE for combining these signals. To foster further research, we publish a novel AED dataset consisting of ASR outputs on the LibriSpeech corpus with annotated transcription errors.

confidence score, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2203.07172

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Ontario > Toronto (0.04)
Oceania > Australia > Queensland > Brisbane (0.04)
(8 more...)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.93)

Industry: Information Technology (0.47)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Add feedback